Designing patterns for profile HMM search

Authors

  • Yanni Sun
  • Jeremy Buhler
Abstract

MOTIVATION: Profile HMMs are a powerful tool for modeling conserved motifs in proteins. These models are widely used by search tools to classify new protein sequences into families based on domain architecture. However, the proliferation of known motifs and new proteomic sequence data poses a computational challenge for search, requiring days of CPU time to annotate an organism's proteome.

RESULTS: We use PROSITE-like patterns as a filter to speed up the comparison between a protein sequence and a profile HMM. A set of patterns is designed starting from the HMM, and only sequences matching one of these patterns are compared to the HMM by full dynamic programming. We give an algorithm to design patterns with maximal sensitivity subject to a bound on the false positive rate. Experiments show that our patterns typically retain at least 90% of the sensitivity of the source HMM while accelerating search by an order of magnitude.

AVAILABILITY: Contact the first author at the address below.
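The filtering idea in the abstract can be illustrated with a minimal Python sketch. It covers only the filter step (match a sequence against PROSITE-like patterns, then score the survivors), not the authors' pattern-design algorithm. The function names `prosite_to_regex` and `filtered_search`, the example pattern, the toy sequences, and the stand-in scoring function are all hypothetical; a real pipeline would call a full dynamic-programming comparison against the profile HMM in place of `toy_score`.

```python
import re

# One pattern element: x, a residue, [ABC], or {ABC}, with optional (n) or (n,m).
# Assumption: anchors (< and >) and other PROSITE extensions are not handled.
_ELEM = re.compile(r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")


def prosite_to_regex(pattern):
    """Translate a PROSITE-like pattern into a compiled regular expression."""
    pieces = []
    for elem in pattern.rstrip(".").split("-"):
        m = _ELEM.match(elem)
        if m is None:
            raise ValueError(f"unsupported pattern element: {elem!r}")
        core, lo, hi = m.groups()
        if core == "x":
            rx = "."                      # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            rx = core                     # residue or [ABC] is already valid regex
        if lo is not None:
            rx += "{" + lo + ("," + hi if hi else "") + "}"
        pieces.append(rx)
    return re.compile("".join(pieces))


def filtered_search(sequences, patterns, full_score):
    """Run the expensive scorer only on sequences matching at least one pattern."""
    regexes = [prosite_to_regex(p) for p in patterns]
    hits = []
    for name, seq in sequences:
        if any(rx.search(seq) for rx in regexes):
            hits.append((name, full_score(seq)))
    return hits


# Hypothetical data; toy_score stands in for full HMM dynamic programming.
scored = []

def toy_score(seq):
    scored.append(seq)   # record which sequences reached the expensive step
    return len(seq)      # placeholder score, not a real HMM score

sequences = [
    ("seq1", "MKCAAGCLAT"),  # matches the pattern
    ("seq2", "MKTAYIAKQR"),  # no cysteine, filtered out
    ("seq3", "GGCWWCVPAA"),  # P at the excluded position, filtered out
]
hits = filtered_search(sequences, ["C-x(2,4)-C-[LIVM]-{P}"], toy_score)
print(hits)
```

In this toy run only one of the three sequences reaches the scoring function, which is where the speedup of a filter comes from; the trade-off is that any true match the patterns miss is lost, which is why the abstract frames pattern design as maximizing sensitivity under a false positive bound.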

Similar resources

Accelerating Information Retrieval from Profile Hidden Markov Model Databases

The profile hidden Markov model (profile-HMM) is an efficient statistical approach for representing protein families. Currently, several databases maintain valuable protein sequence information as profile-HMMs. There is increasing interest in improving the efficiency of searching profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance searching ef...

MetaDomain: A Profile HMM-Based Protein Domain Classification Tool for Short Sequences

Protein homology search provides a basis for functional profiling in metagenomic annotation. Profile HMM-based methods classify reads into annotated protein domain families and can achieve better sensitivity for remote protein homology search than pairwise sequence alignment. However, their sensitivity deteriorates as read length decreases. As a result, a large number of short reads canno...

Design Aspects of Patient Profiles in Health Information Tailoring Systems: A Systematic Scoping Review

Introduction: Tailoring the content of health information to the needs, preferences and abilities of individuals leads to more informed and empowered health consumers. Computerized tailoring of health information requires the patient’s characteristics. A user profile consists of personal data, which are basic components in designing computer-tailoring systems. The present study aimed to ident...

Accelerated Profile HMM Searches

Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) alg...

Extraction of hidden Markov model representations of signal patterns in DNA sequences.

We have developed a method to extract the signal patterns in DNA sequences. In this method, a Genetic Algorithm (GA) and the Baum-Welch algorithm are used to obtain the best Hidden Markov Model (HMM) representations of the signal patterns in DNA sequences. The GA is used to search for the best network shapes and the initial parameters of the HMMs. The Baum-Welch algorithm is used to optimize the HMM para...

Journal:
  • Bioinformatics

Volume 23, Issue 2

Pages: -

Publication date: 2007